Design and evaluation of prosodically-sensitive concatenative units for a Korean TTS system
Abstract
This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Korean text-to-speech (TTS) synthesis system. The diphones used are prosodically conditioned in the sense that a single conventional diphone is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The four levels of the Korean prosodic hierarchy were observed in the diphone selection process, thereby selecting four different versions of each diphone: three edge diphones from the prosodic domains of the intonational phrase (IP), accentual phrase (AP) and prosodic word (PW), and a non-edge diphone from the domain of the prosodic word. Due to the size of the corpus that we employed, our system covers only 36.4% of the 6,503 possible diphones. A listening experiment designed to evaluate the quality of the diphone database showed that listeners preferred stimuli composed of prosodically appropriate diphones. We interpret this as supporting the view that segments carry prosodic domain information.
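The inventory described above, in which each conventional diphone is stored in up to four prosodically conditioned versions, can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the `Position` labels, the `select_unit` and `coverage` helpers, the diphone names, and the file names are all hypothetical. The abstract also does not say whether the 36.4% coverage figure counts diphone types or individual versions; the sketch counts types.

```python
from enum import Enum


class Position(Enum):
    """Hypothetical labels for the four prosodic versions of a diphone."""
    IP_EDGE = "IP"              # edge of an intonational phrase
    AP_EDGE = "AP"              # edge of an accentual phrase
    PW_EDGE = "PW"              # edge of a prosodic word
    PW_INTERNAL = "PW-internal" # non-edge, inside a prosodic word


def select_unit(inventory, diphone, position):
    """Return the stored unit for this diphone in this prosodic position, or None."""
    return inventory.get((diphone, position))


def coverage(inventory, total_possible):
    """Fraction of possible diphone types with at least one stored version."""
    covered = {diphone for (diphone, _position) in inventory}
    return len(covered) / total_possible


# Toy inventory (made-up diphones and file names, for illustration only).
inventory = {
    ("a-n", Position.IP_EDGE): "an_ip.wav",
    ("a-n", Position.PW_INTERNAL): "an_pw_internal.wav",
    ("n-a", Position.AP_EDGE): "na_ap.wav",
}

# Arithmetic from the abstract: 36.4% of 6,503 possible diphones.
estimated_covered = round(0.364 * 6503)  # roughly 2,367 diphones
```

Keying the inventory on the pair (diphone, prosodic position) keeps the prosodic domain explicit at lookup time, which is the property the listening experiment evaluates.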
Similar resources
Design and evaluation of a speech synthesis model based on concatenation of units sensitive to prosodic context
This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Persian text-to-speech (TTS) synthesis system. The syllables used are prosodically conditioned in the sense that a single conventional syllable is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The three levels of the Per...
A concatenative Mandarin TTS system without prosody model and prosody modification
This paper proposes a two-step solution for generating natural prosody in TTS, in which no prosody prediction and modification are needed. A large phonetically and prosodically enriched speech corpus has been collected as the unit pool for the synthesizer. A multi-tier non-uniform unit selection scheme is developed to pick up the most suitable segments for concatenation from the unit pool. Fina...
Perceptually based automatic prosody labeling and prosodically enriched unit selection improve concatenative text-to-speech synthesis
Prosody is an important factor in the quality of text-to-speech (TTS) synthesis. Typically, acoustic parameters such as f0 and duration are the only variables related to prosody that are used to determine unit selection. Our study explored adding the explicit use of linguistically and perceptually motivated prosodic categories in unit selection-based TTS. One of our goals was to automate the pro...
Data pruning using confidence measures for concatenative synthesis system built using automatically transcribed audio
Today, we can record and store large amounts of single speaker audio data, and also download it from the web. Generally, these data are prosodically rich and can therefore act as excellent candidates for building concatenative text-to-speech (TTS) systems. But transcriptions for these audio data are often not available and automatic transcriptions are error prone. In addition, these audio data ...
Vocalic sandwich, a unit designed for unit selection TTS
Unit selection text-to-speech systems currently produce very natural synthetic sentences by concatenating speech segments from a large database. Recently, increasing demand for designing high quality voices with less data creates need for further optimization of the textual corpus recorded by the speaker. The optimization process of this corpus is traditionally guided by the coverage rate of we...
Journal:
- Computer Speech & Language
Volume 22
Publication date: 2008